Generative Adversarial Network (GAN)
Model
$$ \begin{aligned} \epsilon \sim p_{\epsilon} \to \boxed{\mathcal{G}(\epsilon; \theta)} \to \tilde{x} \sim p_g \to &\boxed{\mathcal{D}(x; \psi)} \to P[\text{Real}] \\ &\uparrow \\ x &\sim p_r \end{aligned} $$
Sometimes written as: $ p_r = p $, $ p_g = q $, $ D(x; \psi) = P[\text{Real}] $, $ G(\epsilon; \theta) = f(\epsilon; \theta) $.
$ \theta $ and $ \psi $ are the weights of the generator and discriminator networks, learned by backpropagation.
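A minimal PyTorch sketch of these two networks; the MLP architecture, layer widths, and data dimension are illustrative assumptions, not part of the notes.

```python
import torch.nn as nn

NOISE_DIM, DATA_DIM = 64, 784   # assumed sizes (e.g. flattened 28x28 images)

# Generator G(eps; theta): maps noise eps ~ p_eps to a fake sample x~ ~ p_g
G = nn.Sequential(
    nn.Linear(NOISE_DIM, 256), nn.ReLU(),
    nn.Linear(256, DATA_DIM), nn.Tanh(),
)

# Discriminator D(x; psi): outputs P[Real] in (0, 1)
D = nn.Sequential(
    nn.Linear(DATA_DIM, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)
```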
Loss Function
$$ \begin{aligned} &\min_{\mathcal{G}} \max_{\mathcal{D}} \; \mathbb{E}_{x \sim p_r}[\log D(x; \psi)] + \mathbb{E}_{\tilde{x} \sim p_g}[\log(1 - D(\tilde{x}; \psi))] \\ &\min_{\theta} \max_{\psi} \; \mathbb{E}_{x \sim p_X}[\log D(x; \psi)] + \mathbb{E}_{\epsilon \sim p_{\epsilon}}[\log(1 - D(f(\epsilon; \theta); \psi))] \end{aligned} $$
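A minibatch Monte Carlo estimate of this objective might look like the following sketch (assuming the `G` and `D` networks from the model sketch above).

```python
import torch

# Minibatch estimate of V(D, G) =
#   E_{x ~ p_r}[log D(x; psi)] + E_{eps ~ p_eps}[log(1 - D(f(eps; theta); psi))].
# `D` and `G` are the (assumed) networks sketched in the Model section.
def value_estimate(D, G, x_real, eps):
    real_term = torch.log(D(x_real)).mean()        # E_{x ~ p_X}[log D(x; psi)]
    fake_term = torch.log(1 - D(G(eps))).mean()    # E_{eps ~ p_eps}[log(1 - D(f(eps; theta); psi))]
    return real_term + fake_term                   # D maximizes this value, G minimizes it
```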
Training
$$ \begin{aligned} &\nabla_{\psi} \frac{1}{M} \sum_{m=1}^{M} \left[ \log D(x_m; \psi) + \log(1 - D(f(\epsilon_m; \theta); \psi)) \right] \\ &\nabla_{\theta} \frac{1}{M} \sum_{m=1}^{M} \log(1 - D(f(\epsilon_m; \theta); \psi)) \end{aligned} $$
The discriminator weights $ \psi $ are updated by gradient ascent on the first estimate; the generator weights $ \theta $ by gradient descent on the second.
$ p_X $: empirical distribution, $ p_\epsilon $: uniform distribution.
Minibatch of real samples: $ x_1, \dots, x_M \sim p_X $,
Minibatch of noise samples: $\epsilon_1, \dots, \epsilon_M \sim p_{\epsilon} $.
$ \theta $ is trained such that $ p_g $ gets closer and closer to $ p_r $.
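A minimal training-step sketch under the same assumptions (the `G` and `D` networks from above, uniform noise, and Adam optimizers with an illustrative learning rate); in practice the log terms are usually computed with a numerically stable cross-entropy on logits.

```python
import torch

# Assumed optimizers; the learning rate is illustrative, not from the notes.
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)

def train_step(x_real, noise_dim=64):
    M = x_real.shape[0]

    # Discriminator step: gradient ascent on
    #   (1/M) sum_m [log D(x_m; psi) + log(1 - D(f(eps_m; theta); psi))]
    eps = torch.rand(M, noise_dim)            # minibatch of noise samples, p_eps uniform
    x_fake = G(eps).detach()                  # hold theta fixed during the psi update
    loss_D = -(torch.log(D(x_real)).mean() + torch.log(1 - D(x_fake)).mean())
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator step: gradient descent on (1/M) sum_m log(1 - D(f(eps_m; theta); psi))
    eps = torch.rand(M, noise_dim)            # fresh noise minibatch
    loss_G = torch.log(1 - D(G(eps))).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```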
Notice
- Train the generator and discriminator in separate, alternating steps (as in the sketch above)! Make sure the discriminator always sees samples from the latest, improved generator; otherwise training speed will be greatly compromised.